People Overload!

Understanding Population Density in Certain Countries

EXECUTIVE SUMMARY

The world’s population grows with every passing second. As of November 2020, it is estimated at around 7.8 billion. With new births occurring every second, overpopulation is a real possibility in many parts of the world. For any country, understanding the statistics behind its population is crucial.

Population density is the measurement of population per unit area. It simply refers to the number of people living in a given area, such as a household. This research examines population density values in three countries: the Philippines, Japan, and South Korea.

The paper compares population density across these countries by plotting the values on population density maps. Separate density maps for the different demographic groups in each country are also shown.

The results yield several insights. For Japan and South Korea, there is a large discrepancy between the population density of women and that of men. The paper also shows that the Philippines outweighs Japan and South Korea in overall population density.

INTRODUCTION

As of November 2020, the world’s population is estimated at around 7.8 billion. With new births occurring every second, overpopulation is a real possibility in many parts of the world. According to Worldometers [1], the most populous country is China, with almost 1.4 billion people, or about 18% of the world’s population. The Philippines is among the 20 most populous countries, ranking 13th with an estimated current population of 110 million. It is also notable that Japan, a country only moderately larger in land area than the Philippines, ranks 11th with a population of 126 million [1].

For a country, it is crucial to determine the statistics behind its population. One of the most important factors in assessing a country’s economic life is its population growth. When many people live in a given area, demand for resources rises while the available supply falls. If a large number of people live in one household, consumption of utilities such as water, food, electricity, and internet will be much higher than in a smaller household. In Tokyo, Japan, for example, rent is notoriously high due to the city's population density. At the same time, the resources available per person decrease, in line with the law of supply and demand. On the other hand, concentrating the population in certain areas leaves nature untouched elsewhere, preserving Mother Nature's beauty for us to appreciate, as is the case in many areas of our country, the Philippines. This can also help a country's economy by attracting travelers and supporting the development of a tourism industry. Countries are therefore very keen on monitoring population growth in the different areas within their jurisdiction.

One way of assessing the effect of population is to examine population density. Population density is the measurement of population per unit area. It simply refers to the number of people living in an area, as in our example of a household. There are several factors to consider when examining population density. Many countries check whether their population density is too high, which might lead to overconsumption of resources and diminishing supply. Countries can also check whether population density in some areas is too low, which may indicate lagging development in sparse areas. Indeed, population density plays a role in economic development, and this aspect merits further study [2].
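As a minimal illustration of the definition above, the sketch below divides a head count by an area. The function name and the household figures are hypothetical and are not taken from the dataset.

```python
def population_density(population: float, area: float) -> float:
    """Return people per unit area; the unit of the result follows the unit of `area`."""
    if area <= 0:
        raise ValueError("area must be positive")
    return population / area


# Hypothetical households with the same floor area (in square meters).
small_household = population_density(population=3, area=60.0)
large_household = population_density(population=9, area=60.0)

# Three times the people on the same area means three times the density.
print(small_household, large_household)
```

The same ratio applies at any scale, whether the area is a household, a grid cell, or a whole country.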

The High Resolution Population Density Maps + Demographic Estimates dataset in the AWS Open Data Registry is a set of population density data for a selection of countries from Facebook Connectivity Lab and Center for International Earth Science Information Network – CIESIN – Columbia University [3]. The dataset estimates the number of people living within 30-meter grid tiles and will be used to answer the question: how is the population density of certain demographics distributed in Japan, South Korea, and the Philippines?

PROBLEM STATEMENT

1. How is the population density of certain demographics distributed in Japan, South Korea, and the Philippines?

2. How does the Philippines compare to Japan and South Korea in terms of population density?

BUSINESS VALUE

METHODOLOGY

To properly address the problem, the researchers will use a portion of the Facebook-CIESIN dataset, namely the data for Japan, South Korea, and the Philippines. Some information, such as the country and the demographic, appears only in the file name, so corresponding columns will be added during processing. The researchers will follow the general workflow defined below to arrive at a conclusion and recommendations.

Figure 1: A general workflow of the study

OVERVIEW OF THE METHODOLOGY

Each step is discussed in detail in the following sections. As a general overview of the methodology, a brief description of each step is given below:

1. Data Gathering

The files in the dataset can be listed with the following command:

    aws s3 ls s3://dataforgood-fb-data/ --no-sign-request

Documentation Link:

    https://dataforgood.fb.com/docs/

Note that the chosen dataset contains more files that could be used for the project, but to limit the scope of the research, only files for the selected countries are used. The file sizes were checked by downloading and decompressing the files locally, arriving at a total of about 11GB.

2. Data Preprocessing

For this dataset in particular, no cleaning or transformation was required. Multiple files were merged to arrive at the final complete dataset.

3. Data Description

4. Exploratory Data Analysis

5. Interpretation of Results

DATA GATHERING

The data used for the study was sourced from the AWS Open Data Registry and can be found by searching for the title "High Resolution Population Density Maps + Demographic Estimates by CIESIN and Facebook". It contains almost 27GB of files, many of which are CSV files with coordinate and population density columns. The dataset has population density data for many countries. To limit the scope of this research, we focus only on the following:

1. Philippines
2. South Korea
3. Japan

This research focuses on Japan and South Korea and compares them to the Philippines, since the three countries have roughly comparable populations and locations but different land masses.

The data for the Philippines, South Korea, and Japan are organized into several folders and CSV files that provide population densities for different demographics within each country. The demographic categories are the following:

1. Children under Five
2. Elderly (60 Yrs Old+)
3. Men
4. Women
5. Women of Reproductive Age 15-49
6. Youth Age 15-24
7. Total Population

Statistics for each of these categories will be shown. Since this is population density data, the visualizations focus on maps. Note that the files total around 11GB of uncompressed CSV, so the task was accomplished using a Dask cluster of 8GB t2.large instances serving as 3 workers, a scheduler, and a client.

Philippine Filepath

Japan Filepath

Korea Filepath

DATA PREPROCESSING

Since the dataset can be considered big, given the total size of the files and the sheer number of rows and columns, a Dask cluster is used to perform the preprocessing. It can be seen that there are several folders for each of the three countries. These folders correspond to the demographic categories. To process the data, the files are appended into one consolidated Dask DataFrame with dedicated columns for the demographic and the country each row belongs to. In this way, a single DataFrame of the whole dataset is obtained.

The data processing infrastructure is a Dask cluster created in AWS, consisting of a client, a scheduler, and 3 workers. EC2 instances of type t2.large with 8GB storage were used.

The commands below connect the notebook to the existing Dask cluster and preprocess the files.

Dask Cluster

Reading the CSV Files

The name of each file to be read is listed. New columns for the country and the demographic are added to each file. Using the Dask cluster, we then create a single DataFrame containing all the data.
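The steps described in this section could be sketched as follows. This is a minimal sketch, not the notebook's original code: the scheduler address, the local file paths, the `<COUNTRY>_<demographic>.csv` naming pattern, and the helper names are all assumptions made for illustration.

```python
from pathlib import Path


def parse_file_name(path: str) -> tuple[str, str]:
    """Split an assumed '<COUNTRY>_<demographic>.csv' file name into its two labels."""
    stem = Path(path).name.split(".")[0]          # drop '.csv' or '.csv.gz'
    country, _, demographic = stem.partition("_")
    return country, demographic


def load_all(paths: list[str], scheduler_address: str):
    """Read every CSV lazily with Dask and tag each one with country and demographic."""
    # Imported here so that the pure-Python helper above works without Dask installed.
    import dask.dataframe as dd
    from dask.distributed import Client

    client = Client(scheduler_address)            # connect to the existing cluster
    frames = []
    for path in paths:
        country, demographic = parse_file_name(path)
        frame = dd.read_csv(path).assign(country=country, demographic=demographic)
        frames.append(frame)
    df = dd.concat(frames)                        # one consolidated Dask DataFrame
    return client, df.persist()                   # keep the result in cluster memory


# Hypothetical path, used only to show the parsing step.
print(parse_file_name("data/PHL_children_under_five.csv"))
```

Deriving the labels from the file name keeps the per-file logic identical for every country and demographic, so adding another country would only require extending the list of paths.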

DATA DESCRIPTION

The dataset contains the following columns and their descriptions.

Philippine, Japan and South Korea Population Density Dataset:

Column Name   Description
latitude      Latitude (EPSG:4326/WGS84) of the center of the 1-arc-second-by-1-arc-second grid cell
longitude     Longitude (EPSG:4326/WGS84) of the center of the 1-arc-second-by-1-arc-second grid cell
population    Estimated (statistical) number of people in that grid cell
country       Name of the country
demographic   Demographic category (division of the population)
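
The 1-arc-second grid cell in the table corresponds to the roughly 30-meter tiles mentioned in the introduction. The sketch below checks that correspondence on a spherical Earth; the mean Earth radius and the approximate city latitudes are assumptions introduced here, not values from the dataset.

```python
import math

EARTH_RADIUS_M = 6_371_008.8              # mean Earth radius (spherical approximation)
ARC_SECOND_RAD = math.radians(1 / 3600)   # one arc-second expressed in radians


def cell_size_m(latitude_deg: float) -> tuple[float, float]:
    """Return the (north-south, east-west) extent in meters of a 1-arc-second cell."""
    north_south = EARTH_RADIUS_M * ARC_SECOND_RAD
    # Meridians converge toward the poles, so the east-west extent shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Approximate latitudes of the three capitals (assumed values).
for city, lat in [("Manila", 14.6), ("Tokyo", 35.7), ("Seoul", 37.6)]:
    ns, ew = cell_size_m(lat)
    print(f"{city}: {ns:.1f} m x {ew:.1f} m")
```

Under these assumptions a cell is about 31 m tall everywhere, while its width narrows at higher latitudes, so cells in Japan and South Korea cover somewhat less ground than cells in the Philippines.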

We will store the data in the variable df, since this dataset is used throughout the research paper. Persisting it on the cluster makes subsequent processing much faster.

The first DataFrame excludes the total_population demographic so that the individual demographic categories can be visualized at each location.

The second DataFrame includes only the total_population demographic so that the total population can be visualized on its own.

EXPLORATORY DATA ANALYSIS

Plotting the Population Density Values

Plotting Philippines, South Korea and Japan

Given the sheer number of rows in the data, a small sample is taken to allow us to visualize the information at a glance. We check only a fraction of the dataset, around 0.005%, since processing all of it would take a long time.

This visualization shows all three countries in the dataset with a sample of their respective population densities. Some areas are less populated than others, while the densest areas are usually near the capital city of each country.
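
The sampling step could look like the sketch below. It assumes the Dask DataFrame is named `df` as described earlier; the helper names and the fixed seed are illustrative choices, and the pure-Python function only demonstrates the same idea on an in-memory sequence.

```python
import random

SAMPLE_PERCENT = 0.005                 # percentage quoted in the text
SAMPLE_FRAC = SAMPLE_PERCENT / 100     # the same value as a fraction: 0.00005


def sample_rows(rows, frac, seed=0):
    """Keep each row independently with probability `frac`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < frac]


def sample_dask(df, frac=SAMPLE_FRAC, seed=0):
    """Draw the same kind of random sample from a Dask DataFrame for plotting."""
    # DataFrame.sample is lazy in Dask; compute() returns a small pandas DataFrame.
    return df.sample(frac=frac, random_state=seed).compute()


# On one million stand-in rows, a 0.005% sample keeps roughly 50 of them.
subset = sample_rows(range(1_000_000), SAMPLE_FRAC)
print(len(subset))
```

Fixing the seed makes the sampled map reproducible between notebook runs, which helps when comparing plots drawn at different sampling fractions.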

A view of each country's population density distribution follows.

Plotting Japan

In Japan, the densest areas aside from Tokyo seem to be Osaka and Nagoya. Aside from that, there appears to be an outlier in the city of Matsue, with a dense concentration of women. The population distribution covers the whole country except for some mountainous regions. The densest areas are concentrated along the Shinkansen routes.

Plotting South Korea

In South Korea, the densest area aside from Seoul is Busan. There are also population concentrations in Daegu and Gwangju. The population distribution covers the whole country except for the northeast, which is a mountainous region. Interestingly, Korail lines also connect the most populated areas in South Korea.

Plotting Philippines

In the Philippines, the densely populated areas are limited to Metro Manila and Davao. An outline of the country can't be clearly seen. Most dense areas outside of Manila and Davao are composed of men, with the exception of Sulu, which has a concentration of women. Unlike the previous two countries, the Philippines does not have a national railroad system. Also unlike the previous two countries, the disparity between the places where people congregate and the rest of the country is apparent, with the markers in other places almost invisible.

In the first map showing the three countries, the shape of the Philippines can be made out just from the markers, but in the isolated view it is not visible at all. The reason is that the high densities in the populated areas generate circle markers that are large relative to those of the other countries. Now that the local data is isolated, we see that only a few places actually generate those markers and that, relative to the two other countries, the population density in these areas of the Philippines is much greater. It is also interesting to note that there is less variety in color, with the dense populations composed mostly of men.

After increasing the sample size fourfold, there are more markers, but the coverage is still greatly uneven. Most markers are small, almost invisible without zooming into the map, and visually there is hardly any noticeable difference. We can conclude that most of the data shows a very low density.

With this, it could be said that development in the Philippines is highly concentrated in a few cities, leaving the rest of the country undeveloped. Some of these undeveloped areas can take advantage of their natural resources for tourism, but they may lack infrastructure or facilities due to the sparse population.

Statistical Measures per Demographic

From the table above, we can derive several analyses that compare each demographic across the countries. This will help in generalizing and drawing conclusions for this research. One initial finding is that, among the three countries, the Philippines has the highest maximum and standard deviation of population, especially for men and women, compared with South Korea and Japan. The major insights are discussed in the Results and Discussion section.
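
A table of this kind can be produced by grouping the rows by country and demographic and summarizing the population column. The sketch below uses only the Python standard library on a few made-up rows; the values and the function name are hypothetical, and the commented Dask call is an assumed equivalent rather than the notebook's actual code.

```python
from collections import defaultdict
from statistics import mean, stdev, variance


def summarize(rows):
    """Group (country, demographic, population) rows and compute summary statistics."""
    groups = defaultdict(list)
    for country, demographic, population in rows:
        groups[(country, demographic)].append(population)
    return {
        key: {
            "mean": mean(values),
            "max": max(values),
            "std": stdev(values),        # sample standard deviation (ddof=1)
            "var": variance(values),     # sample variance (ddof=1)
        }
        for key, values in groups.items()
    }


# Made-up grid-cell populations, for illustration only.
rows = [
    ("PHL", "men", 1.0), ("PHL", "men", 2.0), ("PHL", "men", 3.0),
    ("JPN", "women", 0.5), ("JPN", "women", 1.5),
]
for key, stats in summarize(rows).items():
    print(key, stats)

# Assumed Dask equivalent on the consolidated DataFrame `df`:
# df.groupby(["country", "demographic"])["population"].agg(["mean", "max", "std", "var"]).compute()
```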

Statistical Measures per Total Population

From the table above, we can compare the total population density of each country. An initial insight is that the statistical measures for the Philippines greatly outweigh those of the two other countries. The general analysis is presented in the Results and Discussion section.

RESULTS AND DISCUSSION

To answer the problems at hand, we plotted and visualized the total population density of each country. The first plot in this paper, covering Japan, South Korea, and the Philippines, is one of the most significant findings of this research. In that plot, the total population visualization for the Philippines greatly outweighs those for South Korea and Japan. We can also see that each country has certain areas where population density is concentrated and clustered together. These areas are important for each country to consider since they can generate development and economic activity. The first plot also shows the areas where population density is lowest. Using this visualization, governments can target their projects and aid to those areas. From this point of view, we can already infer that the population density in the Philippines is much greater than in South Korea and Japan.

JAPAN

The Japan map shows the different levels of population density for each demographic in our dataset. Several important observations can be drawn from the visualization. First, there is a large concentration of women’s population density near Matsue. This marker for women is the largest and most visible in the visualization, which suggests that there are many women in Japan, especially in this specific area. There is also a part of Japan where many of the demographics cluster together. The most evident demographic there is the youth (15-24), who may be young professionals and high school students, clustered together with a mass of men. Geographically, this area is Tokyo, the capital of Japan, which suggests that Tokyo is where office workers live and where schools are concentrated. The elderly (60+) are also very evident in the map, especially in the lower part of Japan, which may mean that this demographic is widespread across the country.

SOUTH KOREA

The South Korea map shows the different levels of population density for each demographic in our dataset. Several important observations can be drawn from the visualization. Three demographics are very evident across the map: youth (15-24), men, and women of reproductive age. These three demographics consistently appear together and form dense clusters of population. This suggests that the cities in these areas are likely to be economically productive, since these groups are mid-career workers and young professionals. Compared with the other countries, the South Korea visualization shows denser and more visible clusters of population density, especially in the northern and southern parts of the country. Geographically, these clusters correspond to the major cities of South Korea: Incheon and Cheonan in the north, and Daegu and Busan in the south. This means that these cities have a large mix of youth, men, and women in the age range of around 15-30 years old.

PHILIPPINES

The Philippine maps show the different levels of population density for each demographic in our dataset. Two maps of the Philippines were produced using different sampling fractions. Generally, there are two significant areas where population density is concentrated: the Manila area in Luzon and the Davao area in Mindanao. The demographics in these two areas are evenly mixed between men and women, since these are among the central working areas of the Philippines. Unlike Japan, where the elderly demographic is prominent, the Philippines does not show much of the elderly demographic. Compared with South Korea, where clusters are evenly distributed, the Philippines has only a few clusters of population density.

STATISTICAL MEASURES PER DEMOGRAPHIC

  1. JPN – The highest mean is in the women demographic
  2. JPN – The maximum population density is also in the women demographic
  3. JPN – The highest standard deviation and variance are also in the women demographic
  4. KOR – The highest mean is in the women demographic
  5. KOR – The maximum population density is also in the women demographic
  6. KOR – The highest standard deviation and variance are also in the women demographic
  7. PH – The highest mean is in the men demographic
  8. PH – The maximum population density is in the women demographic
  9. PH – The highest standard deviation and variance are in the women demographic

STATISTICAL MEASURES PER TOTAL POPULATION

  1. The highest mean is seen in the Philippines
  2. The highest maximum population density is also seen in the Philippines
  3. The highest standard deviation is also seen in the Philippines

CONCLUSION AND RECOMMENDATION

A dataset from the AWS Open Data Registry, the High Resolution Population Density Maps dataset, was parsed and underwent preprocessing, filtering, and analysis. To answer the problem statement, exploratory data analysis was conducted, producing visualizations of population density for each country and demographic. Several comparisons and statistical measures were made to distinguish the characteristics of each demographic and how they might affect the country. The results support the following conclusions:

1. Japan has a large elderly demographic.
2. South Korea has many mixed clusters of young professionals, evenly split between men and women.
3. Both Japan and South Korea have high population densities of women, based on the statistical measures.
4. The population density of the Philippines greatly outweighs that of the two other countries.
5. The Philippines has a larger men demographic than women demographic.
6. The Philippine population is more concentrated in a few areas than those of Japan and South Korea.

Recommendations are also given to further improve the research. Since the population density values are already usable in this dataset, little can be done with it alone beyond statistical measures and visualization. It is recommended to augment this dataset with information from other datasets to provide more context, or to combine them as features or labels in machine learning models.

REFERENCES AND ACKNOWLEDGEMENTS

[1] https://www.worldometers.info/

[2] Yegorov, Yuri. (2015). Economic Role of Population Density. https://www.researchgate.net/publication/283637652_Economic_Role_of_Population_Density Accessed 29 Nov 2020.

[3] Facebook Connectivity Lab and Center for International Earth Science Information Network – CIESIN – Columbia University. 2016. High Resolution Settlement Layer (HRSL). Source imagery for HRSL © 2016 DigitalGlobe. https://dataforgood.fb.com/docs/high-resolution-population-density-maps-demographic-estimates-documentation/ Accessed 29 Nov 2020.